sparsepca and amanpg find sparse loadings in principal component analysis (PCA) via an alternating manifold proximal gradient method (A-ManPG). Seeking a sparse basis allows the leading principal components to be easier to interpret when modeling with high-dimensional data. PCA is modeled as a regularized regression problem under the elastic net constraints (linearized L1 and L2 norms) in order to induce sparsity. Due to the nonsmoothness and nonconvexity numerical difficulties, A-ManPG is implemented to guarantee convergence.
The package provides a function for performing sparse PCA and a function for normalizing data.
The authors of A-ManPG are Shixiang Chen, Shiqian Ma, Lingzhou Xue, and Hui Zou. The Python and R packages are maintained by Justin Huang and Benjamin Jochem. A MATLAB implementation is maintained by Shixiang Chen.
The A-ManPG algorithm can be applied to solve the general manifold optimization problem
\[\begin{equation} \text{min}~F(A,B) := H(A,B)+f(A)+g(B)\text{ subject to (s.t.) }A\in\mathcal{M}_1, B\in\mathcal{M}_2 \end{equation}\]
where \(H(A,B)\) is a smooth function of \(A,B\) with a Lipschitz continuous gradient, \(f(\cdot)\) and \(g(\cdot)\) are lower semicontinuous (possibly nonsmooth) convex functions, and \(\mathcal{M}_1,\mathcal{M}_2\) are two embedded submanifolds in the Euclidean space.
For sparse PCA, the following function definitions are used:
where \(X\) is the \(n\times p\) data matrix or \(n\times n\) covariance matrix, \(A\) is the scores and \(B\) is the loadings, \(k\) is the rank of the matrices (in other words, how many principal components are desired), \(\lambda_1\) is the L1 norm penalty and \(\lambda_2\) is the L2 norm penalty. Both the L1 and L2 norm are used as elastic net regularization to impose sparseness within the loadings. Note that a different L1 norm penalty is used for every principal component, and the algorithm operates differently when the L2 norm penalty is set to a large constant (np.inf or Inf).
The A-ManPG algorithm uses the following subproblems with an alternating updating scheme to solve sparse PCA, computed in a Gauss-Seidel manner for faster convergence.
\[\begin{align} D_k^A := \operatorname{arg\,min}_{D^A} \langle\nabla_A H(A_k,B_k),D^A\rangle + f(A^k + D^A) + \frac{1}{2t_1}\|D^A\|^2_F~\text{ s.t. }~D^A\in\text{T}_{A_k}\mathcal{M}_1 \\ D_k^B := \operatorname{arg\,min}_{D^B} \langle\nabla_A H(A_{k+1},B_k),D^B\rangle + f(B^k + D^B) + \frac{1}{2t_2}\|D^B\|^2_F~\text{ s.t. }~D^B\in\text{T}_{B_k}\mathcal{M}_2 \end{align}\]
where \(A_{k+1}\) is obtained via a retraction operation (in this case, polar decomposition), \(t_1\leq L_A\), and \(t_2\leq L_B\). \(L_A\) and \(L_B\) are the least upper bounds of the Lipschitz constants for \(\nabla_A H(A,B)\) and \(\nabla_B H(A,B)\), respectively. The subproblems are solved using an adaptive semismooth Newton method.
Let \(\epsilon\) represent a tolerance level to detect convergence. An \(\epsilon\)-stationary point is defined as a point \((A, B)\) with corresponding \(D^A\) and \(D^B\) that satisfy the following:
\[\begin{equation} \|D^A/t_1\|_F^2+\|D^B/t_2\|_F^2 \leq \epsilon^2 \end{equation}\]
The algorithm reaches an \(\epsilon\)-stationary point in at most
\[\frac{2(F(A_0,B_0)-F^*)}{ ((\gamma\bar{\alpha}_1t_1+\gamma\bar{\alpha}_2t_2)\epsilon^2)}\]
iterations, where:
The following describes the algorithm used for solving the general manifold optimization problem using A-ManPG. For solving sparse PCA, the algorithm is implemented with the aforementioned definitions.
Input initial point (A0,B0) and necessary parameters for the required problem
for i=0,1,... do
Solve the first subproblem for Da
Set alpha = 1
while F(Retr(alpha * Da),B) > F(A,B) - alpha * norm(Da)^2 / (2 * t1) do
alpha = gamma * alpha
end while
Set A = Retr(alpha * Da)
Solve the second subproblem for Db
Set alpha = 1
while F(A,Retr(alpha * Db)) > F(A,B) - alpha * norm(Db)^2 / (2 * t2) do
alpha = gamma * alpha
end while
Set B = Retr(alpha * Db)
end for
Return A as the scores and B as the sparse loadings
To install the R package, install amanpg directly from CRAN.
install.packages("amanpg")
To install the Python package, use pip to obtain sparsepca from PyPI.
pip3 install sparsepca
spca.amanpg(z, lambda1, lambda2,
f_palm = 1e5, x0 = NULL, y0 = NULL, k = 0, type = 0,
gamma = 0.5, maxiter = 1e4, tol = 1e-5,
normalize = TRUE, verbose = FALSE)
spca(z, lambda1, lambda2,
x0=None, y0=None, k=0, gamma=0.5, type=0,
maxiter=1e4, tol=1e-5, f_palm=1e5,
normalize=True, verbose=False):
| Name | Python Type | R Type | Description |
|---|---|---|---|
z |
numpy.ndarray | matrix | Either the data matrix or sample covariance matrix |
lambda1 |
float list | numeric vector | List of parameters of length n for L1-norm penalty |
lambda2 |
float or numpy.inf | numeric or Inf | L2-norm penalty term |
x0 |
numpy.ndarray | matrix | Initial x-values for the gradient method, default value is the first n right singular vectors |
y0 |
numpy.ndarray | matrix | Initial y-values for the gradient method, default value is the first n right singular vectors |
k |
int | int | Number of principal components desired, default is 0 (returns min(n-1, p) principal components) |
gamma |
float | numeric | Parameter to control how quickly the step size changes in each iteration, default is 0.5 |
type |
int | int | If 0, b is expcted to be a data matrix, and otherwise b is expected to be a covariance matrix; default is 0 |
maxiter |
int | int | Maximum number of iterations allowed in the gradient method, default is 1e4 |
tol |
float | numeric | Tolerance value required to indicate convergence (calculated as difference between iteration f-values), default is 1e-5 |
f_palm |
float | numeric | Upper bound for the F-value to reach convergence, default is 1e5 |
normalize |
bool | logical | Center and normalize rows to Euclidean length 1 if True, default is True |
verbose |
bool | logical | Function prints progress between iterations if True, default is False |
Python returns a dictionary with the following key-value pairs, while R returns a list with the following elements:
| Key | Python Value Type | R Value Type | Value |
|---|---|---|---|
loadings |
numpy.ndarray | matrix | Loadings of the sparse principal components |
f_manpg |
float | numeric | Final F-value |
x |
numpy.ndarray | matirx | Corresponding ndarray in subproblem to the loadings |
iter |
int | numeric | Total number of iterations executed |
sparsity |
float | numeric | Number of sparse loadings (loadings == 0) divided by number of all loadings |
time |
float | numeric | Execution time in seconds |
Consider the two examples below for running sparse PCA on randomly-generated data: one using a finite \(\lambda_2\), and the other using a large constant \(\lambda_2\).
As with other libraries, begin by loading amanpg in R.
library(amanpg)
Before proceeding, it is helpful to determine a few parameters. Let the rank of the sparse loadings matrix be \(k=4\) (returning four principal components), the input data matrix be \(n\times p\) where \(n=1000\) and \(p=500\), \(\lambda_1\) be a \(4\times 1\) “matrix” where \(\lambda_{i,1}=0.1\), and \(\lambda_2=1\).
# parameter initialization
k <- 4
n <- 1000
p <- 500
lambda1 <- matrix(data=0.1, nrow=k, ncol=1)
lambda2 <- 1
For this example, the data matrix z is randomly generated from the normal distribution. Although it should be centered to mean 0 and normalized to Euclidean length 1, the function will automatically preprocess the input matrix when normalize=TRUE.
# data matrix generation
set.seed(10)
z <- matrix(rnorm(n * p), n, p)
as.data.frame(z)